We introduce a new technique to generate scattering amplitudes at one loop. Traditional tree
algorithms, which handle diagrams with fixed momenta, are promoted to generators of loop-momentum
polynomials that we call open loops. Combining open loops with tensor-integral and OPP reduction
results in a fully flexible, very fast, and numerically stable one-loop generator. As demonstrated
with non-trivial applications, the open-loop approach will make it possible to obtain precise
predictions for a very wide range of collider processes.

Theoretical simulations of scattering processes play a
key role for the interpretation of data collected at the 
Large Hadron Collider (LHC). Whenever theory predic- 
tions are used to link data to model parameters, or to 
separate signals from backgrounds, perturbative calcu- 
lations beyond leading order (LO) are indispensable, in 
order to reduce theoretical errors and to quantify them in 
a reliable way. The vast physics programme of the LHC 
requires next-to-leading-order (NLO) predictions for a 
large variety of processes and theoretical models. In this 
context, the fairly large particle multiplicities resulting 
from the high collider energy can lead to one-loop ampli- 
tudes of unmanageable complexity. Handling $2 \to 4$ processes
with traditional one-loop techniques yields severe
numerical instabilities and gigantic algebraic expressions, 
and can require huge CPU and human power. 

The importance of these challenges, marked by the
creation of the 2005 Les Houches priority list, triggered
a series of recent theoretical developments that
led to the completion of various multi-particle NLO calculations [1].
By using tensor-integral reduction and
Feynman diagrams, it became possible to handle multi-particle
processes with high efficiency and numerical stability.
Alternatively, new reductions of on-shell type
were introduced that avoid tensor integrals and reduce
all process-dependent aspects of one-loop calculations
to a LO problem. In this framework, the Ossola-Papadopoulos-Pittau
(OPP) technique led to the development
of highly automatic NLO generators.

One of the features emerging from first LHC applications
is a trade-off between CPU efficiency and automation.
While the tensor-reduction approach leads to the
fastest numerical codes [1, 3], at present its large-scale
applicability is limited by the occurrence of very large
algebraic expressions. In contrast, the higher flexibility of
the current OPP-based codes [8-10] comes at the price
of a lower CPU efficiency. This motivates us to introduce
a new one-loop algorithm that naturally adapts to
tensor-integral and OPP reduction and maximises speed
and flexibility in a way that does not depend on the
employed reduction. Inspired by the observation that
colour-ordered multi-gluon amplitudes can be efficiently
computed by combining tensor integrals with a one-loop
Dyson-Schwinger recursion [11], we formulate a numerical
algorithm that generates one-loop amplitudes via recursive
construction of Feynman diagrams. As outlined
in the following, the method is fully general, and first
non-trivial applications demonstrate its high efficiency
when combined with either tensor-integral or OPP reduction.

Leading-order transition amplitudes $\mathcal{M}$ and virtual
NLO corrections $\delta\mathcal{M}$ are handled as sums of tree and
one-loop Feynman diagrams,

$$\mathcal{M} = \sum_d \mathcal{M}^{(d)}, \qquad \delta\mathcal{M} = \sum_d \delta\mathcal{M}^{(d)}. \tag{1}$$

The corresponding scattering probability densities $\mathcal{W}$
and virtual one-loop corrections $\delta\mathcal{W}$ are

$$\mathcal{W} = \sum_{\mathrm{hel}} \sum_{\mathrm{col}} |\mathcal{M}|^2, \qquad \delta\mathcal{W} = \sum_{\mathrm{hel}} \sum_{\mathrm{col}} 2\,\mathrm{Re}\left(\mathcal{M}^* \delta\mathcal{M}\right). \tag{2}$$

The sums run over colour and helicity states of each external
particle. Colour sums are performed at zero cost
by exploiting the factorisation of individual diagrams
into colour factors $\mathcal{C}^{(d)}$ and colour-stripped amplitudes,

$$\mathcal{M}^{(d)} = \mathcal{C}^{(d)} \mathcal{A}^{(d)}, \qquad \delta\mathcal{M}^{(d)} = \mathcal{C}^{(d)} \delta\mathcal{A}^{(d)}. \tag{3}$$

Algebraic reduction of the colour factors to a standard
basis $\{\mathcal{C}_i\}$ makes it possible to encode all colour sums in the matrix
$\mathcal{K}_{ij} = \sum_{\mathrm{col}} \mathcal{C}_i^* \mathcal{C}_j$, which is computed only once per
process (see for details). 
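
The role of the colour matrix can be illustrated with a minimal sketch. It assumes that, after reduction to the basis, the amplitude for one helicity configuration reads $\mathcal{M} = \sum_i \mathcal{C}_i \mathcal{A}_i$, so that the colour-summed square becomes $\sum_{ij} \mathcal{A}_i^* \mathcal{K}_{ij} \mathcal{A}_j$. The function name and all numerical values below are hypothetical and chosen only for illustration.

```python
def colour_summed_square(amps, colour_matrix):
    """Return sum_ij conj(A_i) * K_ij * A_j for colour-stripped amplitudes A_i."""
    total = 0j
    for i, a_i in enumerate(amps):
        for j, a_j in enumerate(amps):
            total += a_i.conjugate() * colour_matrix[i][j] * a_j
    return total.real  # real for a Hermitian colour matrix


# Toy two-element colour basis (illustrative numbers, not a real process).
K = [[3.0, 1.0],
     [1.0, 3.0]]
A = [1.0 + 1.0j, 0.5 - 2.0j]

W = colour_summed_square(A, K)
```

Since the matrix depends only on the process, it can be computed once and reused for every phase-space point and helicity configuration.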

Colour-stripped tree diagrams $\mathcal{A}^{(d)}$ are computed by
a numerical algorithm that recursively merges sub-trees.
We call a sub-tree a subdiagram obtained by cutting a
tree. Sub-tree amplitudes are complex n-tuples $w^\beta(i)$,
where $\beta$ is the spinor or Lorentz index of the cut line. The
label $i$ represents the topology, momentum and particle
content of the sub-tree. Sub-trees are recursively merged
by connecting their cut lines to vertices and propagators:

[Diagrammatic equation: sub-tree $i$ obtained by attaching the sub-trees $j$ and $k$ to a vertex and a propagator.] (4)

The sub-trees $i$, $j$, and $k$ involve off-shell momenta, but
in contrast to off-shell currents they represent individual
topologies. Cut lines are marked by dots, and external
lines are not depicted. For brevity, quartic vertices are
not shown explicitly, but their inclusion is straightforward.
In terms of n-tuples, the recursion step reads



$$w^\beta(i) = \frac{X^\beta_{\gamma\delta}(i,j,k)}{p_i^2 - m_i^2 + i\varepsilon}\; w^\gamma(j)\, w^\delta(k), \tag{5}$$

where $X^\beta_{\gamma\delta}/(p_i^2 - m_i^2 + i\varepsilon)$ describes a vertex connecting
$i$, $j$, $k$, and a propagator attached to $i$. The recursion
starts with the external lines of a tree, i.e. the
wave functions of the scattering particles, and terminates
when the generated sub-trees suffice to build all tree diagrams.
The algorithm is based on numerical routines
that implement all wave functions, propagators and ver- 
tices. These building blocks depend only on the theo- 
retical model and are easily obtained from its Feynman 
rules. This approach is similar to the tree algorithm im- 
plemented in MadGraph [3]. Its strength lies in the 
efficiency of colour sums and the systematic recycling of 
sub-trees appearing in different diagrams. 
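
The recycling of sub-trees can be sketched with a memoised recursion. The example below is a toy under stated assumptions, not the implementation described here: it uses a hypothetical scalar theory with a single cubic coupling, so each sub-tree amplitude has one component instead of an n-tuple, the vertex $X$ reduces to a constant, and the $i\varepsilon$ prescription is dropped. All names and numbers are illustrative.

```python
from functools import lru_cache

G = 0.5       # hypothetical cubic coupling of the toy theory
MASS2 = 1.0   # hypothetical squared mass of the internal scalar

# Toy external momenta (E, px, py, pz), all taken as incoming.
MOMENTA = {
    1: (2.0, 0.0, 0.0, 1.0),
    2: (2.0, 0.0, 0.0, -1.0),
    3: (-1.5, 0.5, 0.0, 0.0),
    4: (-1.0, 0.0, 0.5, 0.0),
}

MERGES = {"count": 0}  # counts how many merging steps are actually executed


def legs(tree):
    """External legs of a sub-tree: an int is a leg, a pair is a merged sub-tree."""
    if isinstance(tree, int):
        return (tree,)
    return legs(tree[0]) + legs(tree[1])


def momentum_squared(tree):
    """Minkowski square of the total momentum flowing through the cut line."""
    p = [sum(MOMENTA[leg][mu] for leg in legs(tree)) for mu in range(4)]
    return p[0] ** 2 - p[1] ** 2 - p[2] ** 2 - p[3] ** 2


@lru_cache(maxsize=None)
def subtree(tree):
    """Amplitude of a sub-tree; cached, so shared sub-trees are computed once."""
    if isinstance(tree, int):
        return 1.0 + 0.0j  # scalar wave function of an external leg
    MERGES["count"] += 1
    w_j, w_k = subtree(tree[0]), subtree(tree[1])
    # vertex times propagator; the i*epsilon prescription is omitted here
    return G * w_j * w_k / (momentum_squared(tree) - MASS2)


# Two larger sub-trees that share the sub-tree (1, 2).
a = subtree(((1, 2), 3))
b = subtree(((1, 2), 4))
```

The shared sub-tree `(1, 2)` is merged only once, so three merging steps are executed instead of four. In a realistic setting the cache key would also have to encode the particle content and helicities.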

Let us now consider one-loop amplitudes. A colour-stripped
$n$-point loop diagram is an ordered set of $n$ sub-trees,
$\mathcal{I}_n = \{i_1, \dots, i_n\}$, connected by loop propagators:

$$\delta\mathcal{A}^{(d)} = \int \frac{\mathrm{d}^D q\; \mathcal{N}(\mathcal{I}_n; q)}{D_0 D_1 \cdots D_{n-1}}. \tag{6}$$

[Diagram in (6): the sub-trees $i_1, \dots, i_n$ attached to a closed loop of propagators $D_0, \dots, D_{n-1}$.]

The ordering $\{i_1, \dots, i_n\}$ of the external sub-trees in (6)
describes the topology of this particular one-loop Feynman
diagram, independently of the coloured or colourless
nature of the external particles. Since we do not
apply any ordering selection, such as colour ordering,
the full set of one-loop diagrams includes all orderings
(topologies) that are allowed by the Feynman rules. The
denominators $D_i = (q + p_i)^2 - m_i^2 + i\varepsilon$ depend on the
loop momentum $q$, external momenta $p_i$, and internal
masses $m_i$. All other contributions from loop propagators,
vertices, and external sub-trees are summarised in
the numerator, which is a polynomial of degree $R \le n$ in
the loop momentum,

$$\mathcal{N}(\mathcal{I}_n; q) = \sum_{r=0}^{R} \mathcal{N}_{\mu_1 \dots \mu_r}(\mathcal{I}_n)\, q^{\mu_1} \cdots q^{\mu_r}. \tag{7}$$

Momentum-shift ambiguities are eliminated by setting
$p_0 = 0$. This singles out the $D_0$ propagator, and the loop
momentum $q$ flowing through this propagator is marked
by an arrow in (6). In traditional one-loop calculations,
the coefficients $\mathcal{N}_{\mu_1 \dots \mu_r}$ are explicitly constructed from
the Feynman rules, and the amplitude (6) is expressed
as a linear combination

$$\delta\mathcal{A}^{(d)} = \sum_{r=0}^{R} \mathcal{N}_{\mu_1 \dots \mu_r}(\mathcal{I}_n)\, T^{\mu_1 \dots \mu_r}_{n,r} \tag{8}$$

of tensor integrals

$$T^{\mu_1 \dots \mu_r}_{n,r} = \int \frac{\mathrm{d}^D q\; q^{\mu_1} \cdots q^{\mu_r}}{D_0 D_1 \cdots D_{n-1}}. \tag{9}$$

These are subsequently reduced to $m$-point scalar
integrals $T_{m,0}$ with $m = 1, 2, 3, 4$, which originate
from (9) by cancelling the numerator and at least $n - 4$
denominators $D_i$. Alternatively, the OPP method
makes it possible to bypass tensor integrals through a direct
connection between the numerator $\mathcal{N}(\mathcal{I}_n; q)$ and the scalar-integral
representation of the amplitude. To this end, the
numerator is expressed as a polynomial in the denominators
$D_i$. The scalar-integral coefficients are determined
by evaluating $\mathcal{N}(\mathcal{I}_n; q)$ at loop momenta $q$ that satisfy
multiple-cut conditions of type $D_i = D_j = \dots = 0$.

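In this framework, the numerator can be computed
with tree-level techniques. Let us consider the cut loop
that results from (6) by cutting the $D_0$ propagator and
removing denominators,

$\mathcal{N}^\beta_\alpha(\mathcal{I}_n; q)$ = [diagram: the one-loop topology of (6) with the $D_0$ propagator cut open]. (10)

The indices $\alpha$ and $\beta$ are associated with the arrows that
mark the ends of the cut line, and the trace of the cut
loop corresponds to the numerator $\mathcal{N}(\mathcal{I}_n; q)$. As depicted
in (10), $n$-point cut loops can be constructed by recursively
merging lower-point cut loops and sub-trees. More
explicitly,

$$\mathcal{N}^\beta_\alpha(\mathcal{I}_n; q) = X^\beta_{\gamma\delta}(\mathcal{I}_n, i_n, \mathcal{I}_{n-1})\; \mathcal{N}^\gamma_\alpha(\mathcal{I}_{n-1}; q)\; w^\delta(i_n), \tag{11}$$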

where $X^\beta_{\gamma\delta}$ and $w^\delta$ are the same vertices and sub-trees
that enter the tree algorithm. It is thus possible, within
the OPP framework, to reduce the calculation of scalar-
integral coefficients to a tree-level problem. Highly au- 
tomatic tree generators can be upgraded to loop gener- 
ators [1, , thereby reducing the human power needed 
for NLO calculations by orders of magnitude. However, 
when applied to non-trivial processes, this approach can 
require massive computing resources. The reason is that 
OPP reduction requires repeated evaluations of $\mathcal{N}(\mathcal{I}_n; q)$
for a large number of $q$-momenta.

This is related to the nature of loop calculations, which
requires the knowledge of the numerators as functions of
the loop momentum $q$. It is thus natural to introduce a
new kind of loop-generator algorithm, where the building
blocks of the recursion (11) are handled as functions of $q$.
To this end, we express the cut loop (10) as a polynomial,

$$\mathcal{N}^\beta_\alpha(\mathcal{I}_n; q) = \sum_{r=0}^{R} \mathcal{N}^\beta_{\mu_1 \dots \mu_r;\alpha}(\mathcal{I}_n)\, q^{\mu_1} \cdots q^{\mu_r}. \tag{12}$$

To emphasise the loop-momentum dependence encoded
in the set of coefficients $\mathcal{N}^\beta_{\mu_1 \dots \mu_r;\alpha}(\mathcal{I}_n)$, we call this
representation an open loop. In renormalisable gauge theories,
splitting the $X$ tensor in (11) into a constant and a linear
part,

$$X^\beta_{\gamma\delta} = Y^\beta_{\gamma\delta} + q^\nu Z^\beta_{\nu;\gamma\delta}, \tag{13}$$

we obtain recursion relations for $n$-point open loops in
terms of lower-point open loops and sub-trees:

$$\mathcal{N}^\beta_{\mu_1 \dots \mu_r;\alpha}(\mathcal{I}_n) = \left[ Y^\beta_{\gamma\delta}\, \mathcal{N}^\gamma_{\mu_1 \dots \mu_r;\alpha}(\mathcal{I}_{n-1}) + Z^\beta_{\mu_1;\gamma\delta}\, \mathcal{N}^\gamma_{\mu_2 \dots \mu_r;\alpha}(\mathcal{I}_{n-1}) \right] w^\delta(i_n). \tag{14}$$

The number of coefficients grows with the polynomial degree,
which corresponds to the tensorial rank $r$. However,
symmetrising open-loop tensorial indices $\mu_1 \dots \mu_r$ keeps
the number of components well under control [11]. Once
the coefficients are known, multiple evaluations of the
polynomial (7) can be performed at a negligible CPU
cost. This strongly boosts OPP reduction. Moreover,
the same coefficients can be used for a tensor-integral
representation of the loop amplitude (8). Open
loops can thus be interfaced with both OPP and tensor-integral
reduction in a natural way.
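
The open-loop recursion can be sketched in a few lines. The example is a simplified toy under stated assumptions: the spinor and Lorentz indices $\alpha, \beta, \gamma, \delta$ are suppressed, so the constant part of the vertex is a number `y`, the linear part is a four-vector `z` contracted with $q$, and the sub-tree is a number `w`. Symmetrised tensor coefficients are stored as a dictionary from monomial exponents to complex numbers. The function names and all numerical values are hypothetical.

```python
from math import comb


def open_loop_step(coeffs, y, z, w):
    """One recursion step: multiply the stored polynomial by (y + z.q) * w.

    coeffs maps exponent tuples (a0, a1, a2, a3), i.e. monomials
    q0^a0 q1^a1 q2^a2 q3^a3, to complex coefficients. This is the
    symmetrised form of the tensor coefficients N_{mu_1...mu_r}.
    """
    new = {}
    for expo, c in coeffs.items():
        # constant part of the vertex: the rank is unchanged
        new[expo] = new.get(expo, 0j) + y * w * c
        # linear part of the vertex: the rank is raised by one
        for nu in range(4):
            if z[nu] == 0:
                continue
            raised = tuple(e + (1 if mu == nu else 0) for mu, e in enumerate(expo))
            new[raised] = new.get(raised, 0j) + z[nu] * w * c
    return new


def evaluate(coeffs, q):
    """Evaluate the polynomial at a given loop momentum q (cheap, no recursion)."""
    total = 0j
    for expo, c in coeffs.items():
        term = c
        for mu in range(4):
            term *= q[mu] ** expo[mu]
        total += term
    return total


def n_symmetric_components(rank, dim=4):
    """Number of independent components of a symmetric rank-r tensor."""
    return comb(rank + dim - 1, dim - 1)


coeffs = {(0, 0, 0, 0): 1.0 + 0.0j}  # starting point: trivial zero-point open loop
steps = [  # toy (y, z, w) triples, one per attached sub-tree
    (1.0, (0.5, 0.0, 0.0, -0.5), 2.0),
    (0.0, (1.0, 0.0, 1.0, 0.0), 1.0 + 1.0j),
    (2.0, (0.0, 0.0, 0.0, 0.0), 0.5),
]
for y, z, w in steps:
    coeffs = open_loop_step(coeffs, y, z, w)

# Cross-check against a direct evaluation at fixed loop momentum.
q = (1.0, 2.0, 3.0, 4.0)
direct = 1.0 + 0.0j
for y, z, w in steps:
    direct *= (y + sum(z[nu] * q[nu] for nu in range(4))) * w
```

Once `coeffs` is known, each further call to `evaluate` costs only a handful of multiplications, which is the property that speeds up a reduction requiring many values of $q$. In four dimensions a symmetric rank-2 tensor has 10 independent components instead of 16.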

The efficiency of the open-loop recursion is further increased
by means of relations that arise from pinching
loop propagators. Let us consider the parent ($n$-point)
and child ($(n-1)$-point) diagrams in Fig. 1, where the
child results from pinching the $D_{n-1}$ propagator of the
parent.

FIG. 1: Parent (left) and child (right) open loops.

It is evident that the parent can be constructed
by recycling the $\mathcal{I}_{n-2}$ part of the child. But this requires
that parent and child are cut as in Fig. 1. To this end we
order the external sub-trees using a function $i_k \to S(i_k)$
that fulfils $S(i_k) > 0$; $S(i_k) \neq S(i_l)$ if $i_k$ and $i_l$ contain
different external legs; $S(i_k \oplus i_l) > \max\{S(i_k), S(i_l)\}$,
where $i_k \oplus i_l$ is the merged sub-tree resulting from $i_k$ and
$i_l$. The position and direction of the cut are determined
by selecting contiguous sub-trees $i_1$ and $i_n$ with

$$S(i_k) > S(i_1) \;\; \forall\, k > 1, \qquad S(i_n) > S(i_2). \tag{15}$$

This guarantees that parent and child diagrams are cut
as in Fig. 1, so that each parent can be constructed from
the $\mathcal{I}_{n-2}$ part of a previously computed child.
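
The cut-selection rule (15) can be illustrated with a short sketch. Two ingredients are assumptions made for this example only: the function $S$ is taken to be a binary encoding of the external legs, which is one possible choice satisfying the three stated properties, and the child is modelled by merging the last two sub-trees of the parent. Sub-trees are represented by the sets of their external legs, and only loops with at least three sub-trees are oriented.

```python
def s_value(subtree):
    """Hypothetical choice of S: binary encoding of the external legs."""
    return sum(1 << (leg - 1) for leg in subtree)


def canonical_cut(cycle):
    """Rotate and orient a cyclic list of sub-trees so that rule (15) holds."""
    start = min(range(len(cycle)), key=lambda k: s_value(cycle[k]))
    ordered = cycle[start:] + cycle[:start]       # i_1 has the smallest S
    if len(ordered) > 2 and s_value(ordered[-1]) < s_value(ordered[1]):
        ordered = [ordered[0]] + ordered[:0:-1]   # reverse the loop direction
    return ordered


def pinch_last(ordered):
    """Child topology: merge the last two sub-trees of a parent."""
    return ordered[:-2] + [ordered[-2] | ordered[-1]]


parent = canonical_cut(
    [frozenset({3}), frozenset({1}), frozenset({4}), frozenset({2})]
)
child = canonical_cut(pinch_last(parent))
```

In this toy case the parent is ordered as legs 1, 3, 2, 4, and the child keeps the same first two sub-trees, so the part of the open loop built from them could be shared between the two diagrams.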

The possibility of highly efficient helicity sums is another
key feature of open loops. Unpolarised transition
probabilities require multiple evaluations of the polarised
amplitudes (6). The number of helicity configurations
grows exponentially with the particle multiplicity, and
the resulting CPU cost can be very large. This can be
avoided by exploiting the decomposition (8) into helicity-dependent
coefficients $\mathcal{N}_{\mu_1 \dots \mu_r}$ and helicity-independent
tensor integrals. The CPU-expensive evaluation of tensor
integrals (9) is performed only once, and helicity sums,
when restricted to the coefficients, become very fast.
More explicitly, the contribution of (8) to the unpolarised
transition probability is handled as a linear combination

$$\delta\mathcal{W}^{(d)} = \mathrm{Re}\left[ \sum_{r=0}^{R} \delta\mathcal{W}^{(d)}_{\mu_1 \dots \mu_r}\, T^{\mu_1 \dots \mu_r}_{n,r} \right] \tag{16}$$

with helicity- and colour-summed coefficients

$$\delta\mathcal{W}^{(d)}_{\mu_1 \dots \mu_r} = 2 \sum_{\mathrm{hel}} \Big( \sum_{\mathrm{col}} \mathcal{M}^* \mathcal{C}^{(d)} \Big)\, \mathcal{N}_{\mu_1 \dots \mu_r}(\mathcal{I}_n). \tag{17}$$



The unpolarised representation (16) can be reduced
to scalar integrals with any method, including OPP.
Within the OPP framework, the reduction is performed
by starting from the unpolarised numerator function

$$\delta\mathcal{W}^{(d)}(\mathcal{I}_n; q) = \sum_{r=0}^{R} \delta\mathcal{W}^{(d)}_{\mu_1 \dots \mu_r}\, q^{\mu_1} \cdots q^{\mu_r};$$

in this way open
loops lead to extremely fast helicity sums as compared to
traditional tree generators. The OPP reduction is further
improved by combining sets of loop diagrams with identical
loop propagators but different external sub-trees.
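
The gain from summing helicities at the level of the coefficients can be sketched as follows. The example is a toy: the tensor integrals are replaced by a stand-in function returning fixed numbers, the tensor components are flattened into a list, and the Born-colour interference factors `born` (one per helicity configuration) are hypothetical inputs. It only shows that both orderings of the sums give the same result, while the expensive step is called once instead of once per helicity.

```python
CALLS = {"tensor_integrals": 0}


def tensor_integrals():
    """Stand-in for the expensive, helicity-independent tensor integrals."""
    CALLS["tensor_integrals"] += 1
    return [0.3 - 0.1j, 1.2 + 0.4j, -0.7 + 0.2j]


def helicity_summed_coefficients(born_factors, coefficients):
    """Sum 2 * b_h * N_h over helicities h, component by component."""
    n_components = len(coefficients[0])
    return [
        2 * sum(b * n_h[k] for b, n_h in zip(born_factors, coefficients))
        for k in range(n_components)
    ]


def fast_sum(born_factors, coefficients):
    summed = helicity_summed_coefficients(born_factors, coefficients)
    integrals = tensor_integrals()  # evaluated once for all helicities
    return sum(c * t for c, t in zip(summed, integrals)).real


def naive_sum(born_factors, coefficients):
    total = 0.0
    for b, n_h in zip(born_factors, coefficients):
        integrals = tensor_integrals()  # re-evaluated for every helicity
        total += 2 * (b * sum(c * t for c, t in zip(n_h, integrals))).real
    return total


# Toy input: two helicity configurations, three tensor components each.
born = [1.0 - 0.5j, 0.2 + 0.3j]
coeffs = [[0.1 + 0.2j, -0.4j, 1.0 + 0j], [0.5 + 0j, 0.3 - 0.1j, -0.2j]]

fast = fast_sum(born, coeffs)
calls_fast = CALLS["tensor_integrals"]
naive = naive_sum(born, coeffs)
calls_naive = CALLS["tensor_integrals"] - calls_fast
```

With many helicity configurations the difference in the number of expensive calls grows accordingly, while the helicity sum itself reduces to additions of coefficients.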

As a proof of concept, we realised a fully automatic
generator of QCD corrections to Standard-Model processes.
Diagrams are generated with FeynArts;
sub-tree and open-loop topologies are processed by a
Mathematica program, which concatenates them in a
recursive way, reduces colour factors, and returns Fortran 90
code. The reduction to scalar integrals is performed
in terms of tensor integrals and, alternatively,
with the OPP method. For tensor integrals we use Collier,
a private library by A. Denner and S. Dittmaier,
which implements the scalar integrals of Ref. [3] and
reduction methods that avoid instabilities from spurious
singularities [13]. OPP reduction is performed with CutTools
[13] and, alternatively, with Samurai. Ultraviolet
and infrared divergences are dimensionally regularised.
While loop denominators are consistently treated
in $D = 4 - 2\varepsilon$ dimensions, the momenta and the coefficients
$\mathcal{N}_{\mu_1 \dots \mu_r}$ in (7)-(9) are handled in $D = 4$. Their
$(D - 4)$-dimensional contributions, which yield so-called
$R_2$ rational terms, are restored via process-independent
counterterms [13] using the tree generator.

To assess flexibility and performance of the method, we
considered the $2 \to 2, 3, 4$ reactions $u\bar{u} \to W^+W^- + n\,g$,

FIG. 2: CPU cost of colour- and helicity-summed one-loop
probabilities $\delta\mathcal{W}$ versus number of diagrams. Runtimes per
phase-space point, with tensor-integral ($t_{\mathrm{TI}}$) and OPP reduction
($t_{\mathrm{OPP}}$), on a single Intel i5-750 core with ifort 10.1.




FIG. 3: Accuracy of $\delta\mathcal{W}$ using tensor reduction in double
precision. The probability of accuracy worse than $\Delta$, in
samples of 10^ uniformly distributed phase-space points with 
$\sqrt{s} = 1$ TeV, $p_T > 50$ GeV, $\Delta R_{ij} > 0.5$, is plotted versus $\Delta$.



$u\bar{d} \to W^+ g + n\,g$, $u\bar{u} \to t\bar{t} + n\,g$, and $gg \to t\bar{t} + n\,g$, with
$n = 0, 1, 2$ gluons. This covers all non-trivial processes of
the Les Houches priority list [1]. The open-loop approach
leads to compact codes and fast code generation. For 
instance — as compared to Ref. Q — the numerical code 
for $gg \to W^+W^- b\bar{b}$ becomes two orders of magnitude
smaller, and its generation time goes down from more 
than 1 week to 4 minutes. Also the CPU speed of open 
loops, when compared against the high performance of 
Refs. [1, reveals a further improvement. The CPU 
cost of one-loop scattering probabilities is plotted ver- 
sus the number of diagrams in Fig. 2. Sums over colours
and helicities are always included. For W bosons and 
top quarks, assuming decays into massless left-handed 
fermions, we include a single helicity. For the 12 con- 
sidered processes, involving 0(10) to 0(10'') diagrams, 
the CPU cost scales almost linearly with the number of 
diagrams. This unexpected feature indicates that the 
increase of tensorial rank does not represent an addi- 
tional penalty at large particle multiplicity. With tensor-integral
reduction (upper frame), the runtime per phase-space
point is typically below 1 ms for $2 \to 2$ processes;
for the most involved $2 \to 4$ process it never exceeds
one second. The ratio of timings obtained with CutTools
and tensor integrals (lower frame) shows that,
when combined with open loops, OPP reduction reaches
a similarly high speed. While always slightly
lower, the relative OPP efficiency seems to improve with 
particle multiplicity. This holds also for Samurai. 

The correctness of the results is verified by comparing
tensor-integral versus OPP reductions, and checking
ultraviolet and infrared cancellations. To assess numerical
instabilities, we surveyed the dimensional scaling of the
probability densities $\delta\mathcal{W}$ with respect to
variations of the mass units. Results obtained with tensor
integrals for the 12 considered processes are shown in 
Fig. 3. In samples of 10^ phase space points, the average
number of correct digits for $\delta\mathcal{W}$ ranges from 11
to 15. For the most involved processes, precision lower 
than 10~^ and 10^"^ occurs with less than 2 and 0.1 per- 
mille probability, respectively. This demonstrates the ro- 
bustness of the tensor-reduction approach in double 
precision. In contrast, with OPP reduction, a small but 
non-negligible fraction of points are not sufficiently stable 
in double precision. A detailed discussion of this aspect, 
including possible use of quadruple precision or numerical 
interpolation, is deferred to a forthcoming paper. 
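
A stability test based on dimensional scaling can be sketched as follows. The idea is that a quantity of mass dimension $d$ must scale by the factor $\xi^d$ when all dimensionful inputs are multiplied by $\xi$, so any deviation from this scaling estimates the numerical error. The rescaling factor `xi`, the digit estimate, and the toy observable (which suffers from cancellations when the mass is much smaller than the energy) are all assumptions made for illustration. They are not the quantities studied in this work.

```python
import math


def correct_digits(func, args, dimension, xi=1.37):
    """Estimate the number of stable digits from a rescaling of mass units."""
    reference = func(*args)
    rescaled = func(*(xi * a for a in args)) / xi ** dimension
    if reference == rescaled:
        return 16.0  # agreement to double-precision resolution
    deviation = abs(rescaled - reference) / max(abs(reference), abs(rescaled))
    return min(16.0, -math.log10(deviation))


def toy_observable(energy, mass):
    """Toy quantity of mass dimension 1 with cancellations for mass << energy."""
    return math.sqrt(energy ** 2 + mass ** 2) - energy


stable = correct_digits(toy_observable, (1.0, 1.0), dimension=1)
unstable = correct_digits(toy_observable, (1.0e6, 1.0), dimension=1)
```

For the well-conditioned point the estimate is close to the full double-precision resolution, whereas the point with strong cancellations retains only a few digits. A non-trivial value of `xi` is used because a power of two would rescale floating-point numbers exactly and hide the rounding errors.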

In summary, promoting tree generators to open-loop 
algorithms, we developed a fully flexible, very fast, and 
numerically stable technique to generate one-loop cor- 
rections. Loop momenta are separated from colour and 
helicity structures in a way that naturally adapts to 
tensor-integral and OPP reduction, yielding excellent 
CPU speed with both reductions. Open loops have the 
potential to address a very wide range of problems at 
high-energy colliders, ranging from $2 \to 2$ scattering to
multi-particle processes with up to 0(10^) diagrams. 